import pandas as pd
data1 = {
'A': [1,2,3],
'B': [4,5,6]
}
data2 = {
'A': [7,8,9],
'B': [10,11,12]
}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)Combining and Merging DataFrames with Pandas
This notebook is a guide to combining and merging DataFrames in Pandas, two fundamental operations for data wrangling. We’ll explore: - pd.concat(): For stacking DataFrames vertically or horizontally. - pd.merge(): For performing database-style joins on DataFrames.
1. Setting Up Sample DataFrames
First, let’s import Pandas and create a few sample DataFrames to work with.
df1| A | B | |
|---|---|---|
| 0 | 1 | 4 |
| 1 | 2 | 5 |
| 2 | 3 | 6 |
df2| A | B | |
|---|---|---|
| 0 | 7 | 10 |
| 1 | 8 | 11 |
| 2 | 9 | 12 |
2. Concatenating DataFrames with pd.concat()
Concatenation is like stacking DataFrames on top of each other (row-wise) or side-by-side (column-wise).
Default Concatenation (Row-wise)
By default, pd.concat() stacks DataFrames vertically. This is useful when you have DataFrames with the same columns.
pd.concat([df1, df2])| A | B | |
|---|---|---|
| 0 | 1 | 4 |
| 1 | 2 | 5 |
| 2 | 3 | 6 |
| 0 | 7 | 10 |
| 1 | 8 | 11 |
| 2 | 9 | 12 |
Column-wise Concatenation
You can also concatenate side-by-side by setting axis=1. This is useful for adding new columns.
pd.concat([df1, df2], axis=1)| A | B | A | B | |
|---|---|---|---|---|
| 0 | 1 | 4 | 7 | 10 |
| 1 | 2 | 5 | 8 | 11 |
| 2 | 3 | 6 | 9 | 12 |
3. Merging DataFrames with pd.merge()
Merging is used for database-style joins. It combines DataFrames based on a common column (a “key”).
Default Merge (Inner Join)
pd.merge() performs an inner join by default, combining rows that have matching values in the specified on column.
data3 = {
'A': [1,2,3],
'C': [13,14,15]
}
df3 = pd.DataFrame(data3)
df3| A | C | |
|---|---|---|
| 0 | 1 | 13 |
| 1 | 2 | 14 |
| 2 | 3 | 15 |
df1| A | B | |
|---|---|---|
| 0 | 1 | 4 |
| 1 | 2 | 5 |
| 2 | 3 | 6 |
pd.merge(df1, df3, on='A')| A | B | C | |
|---|---|---|---|
| 0 | 1 | 4 | 13 |
| 1 | 2 | 5 | 14 |
| 2 | 3 | 6 | 15 |
Summary
In this notebook, you learned the key differences between concat and merge:
pd.concat(): Best for stacking DataFrames.axis=0(default): Stacks vertically (appends rows).axis=1: Stacks horizontally (appends columns).
pd.merge(): Best for database-style joins based on common columns.- Use the
onparameter to specify the key to join on. - Supports inner, outer, left, and right joins (though we only covered the default inner join here).
- Use the
Understanding when to use each is crucial for effective data manipulation in Pandas. Happy coding! 🚀